Self-Adjusting Reinforcement Learning
Authors
Abstract
We present a variant of the Q-learning algorithm with automatic control of the exploration rate by a competition scheme. The theoretical approach is accompanied by systematic simulations of a chaos control task. Finally, we give interpretations of the algorithm in the context of computational ecology and neural networks.
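The abstract does not spell out the competition scheme, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It runs tabular Q-learning in which several candidate exploration rates compete on their running average return. The chain task, the candidate values, the scoring rule, and all function names are assumptions made for illustration.

```python
import random

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Standard one-step Q-learning update on a dict-of-lists table."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def choose_action(q, s, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q[s]))
    best = max(q[s])
    # Break ties at random so an all-zero table does not favor one action.
    return rng.choice([a for a, v in enumerate(q[s]) if v == best])

def run_episode(q, epsilon, rng, n_states=5, max_steps=20):
    """Hypothetical chain task: action 1 moves right, action 0 moves left;
    reaching the last state pays reward 1 and ends the episode."""
    s, total = 0, 0.0
    for _ in range(max_steps):
        a = choose_action(q, s, epsilon, rng)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        q_update(q, s, a, r, s_next)
        total += r
        s = s_next
        if r > 0:
            break
    return total

def train(episodes=300, candidates=(0.05, 0.2, 0.5), seed=0):
    """Assumed competition rule: each candidate exploration rate keeps a
    running average of the returns it earned; the current leader is used
    most of the time, and the others are retried occasionally."""
    rng = random.Random(seed)
    q = {s: [0.0, 0.0] for s in range(5)}
    scores = {eps: 0.0 for eps in candidates}
    for ep in range(episodes):
        if ep < len(candidates) or rng.random() < 0.1:
            eps = candidates[ep % len(candidates)]  # give every candidate a turn
        else:
            eps = max(scores, key=scores.get)       # current winner
        ret = run_episode(q, eps, rng)
        scores[eps] += 0.1 * (ret - scores[eps])
    return q, scores
```

Calling `train()` returns the learned Q-table together with the score of each candidate rate. A real self-adjusting scheme would likely adapt the rate continuously rather than choose from a fixed set.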
Similar resources
Self-tuning experience weighted attraction learning in games
Self-tuning experience weighted attraction (EWA) is a one-parameter theory of learning in games. It addresses a criticism that an earlier model (EWA) has too many parameters, by fixing some parameters at plausible values and replacing others with functions of experience so that they no longer need to be estimated. Consequently, it is econometrically simpler than the popular weighted fictitious ...
How an Adaptive Learning Rate Benefits Neuro-Fuzzy Reinforcement Learning Systems
To enable multiple agents to acquire adaptive behaviors in an unknown environment, several neuro-fuzzy reinforcement learning systems (NFRLSs) have been proposed (Kuremoto et al.). Meanwhile, to manage the balance between exploration and exploitation in fuzzy reinforcement learning (FRL), an adaptive learning rate (ALR), which adjusts the learning rate by considering the “fuzzy visit value” of the current sta...
Robot reinforcement learning accuracy-based learning classifier systems with Fuzzy Policy Gradient descent (XCS-FPGRL)
This paper presents XCS-FPGRL, a novel approach to robot reinforcement learning. XCS-FPGRL combines a covering operator and a genetic algorithm. The system is responsible for adjusting precision and reducing the search space according to the reward obtained from the environment, and acts as an innovation discovery component that is responsible for discovering new, better reinforcement learnin...
A Self-Adaptive Proposal Model for Temporal Action Detection based on Reinforcement Learning
Existing action detection algorithms usually generate action proposals through an extensive search over the video at multiple temporal scales, which brings about huge computational overhead and deviates from the human perception procedure. We argue that the process of detecting actions should be naturally one of observation and refinement: observe the current window and refine the span of atten...
Self-organizing state aggregation for architecture design of Q-learning
This work describes a novel algorithm that integrates an adaptive resonance method (ARM), i.e. an ART-based algorithm with a self-organized design, and a Q-learning algorithm. By dynamically adjusting the size of sensitivity regions of each neuron and adaptively eliminating one of the redundant neurons, ARM can preserve resources, i.e. available neurons, to accommodate additional categories. As...